A Hybrid Multi-Target Path Planning Algorithm for Unmanned Cruise Ship in an Unknown Obstacle Environment

To solve the problem of traversal multi-target path planning for an unmanned cruise ship in an unknown obstacle environment of lakes, this study proposed a hybrid multi-target path planning algorithm. The proposed algorithm can be divided into two parts. First, the multi-target path planning problem was transformed into a traveling salesman problem, and an improved Grey Wolf Optimization (GWO) algorithm was used to calculate the multi-target cruise sequence. The improved GWO algorithm optimized the convergence factor by introducing the Beta function, which can improve the convergence speed of the traditional GWO algorithm. Second, based on the planned target sequence, an improved D* Lite algorithm was used to implement the path planning between every two target points in an unknown obstacle environment. The heuristic function in the D* Lite algorithm was improved to reduce the number of expanded nodes, so the search speed was improved, and the planning path was smoothed. The proposed algorithm was verified by experiments and compared with the other four algorithms in both ordinary and complex environments. The experimental results demonstrated the strong applicability and high effectiveness of the proposed method.


Introduction
In recent years, Unmanned Cruise Ships (UCSs) for water quality sampling have been widely used in the field of water environment protection. Generally, a UCS needs to traverse multiple target points for water sampling, but there are many unknown obstacles that can move freely and dynamically change with the environment in the actual river or lake, so the UCS is required to plan an optimization path traversing multiple sample points in a short time and effectively avoid unknown obstacles to cruise safely. Therefore, multi-target path planning of a UCS in an unknown obstacle environment is of great importance [1].
Since the 1970s, many studies on the path planning problem have been conducted. The path planning methods can be roughly divided into several groups: the grid search methods, such as A* algorithm [2], Depth-First Search (DFS) [3], Breadth-first Search (BFS) [4], and Dijkstra algorithm [5]; the sampling-based methods, such as Probabilistic Roadmap (PRM) [6] and Rapidly Exploring Random Tree (RRT) [7]; heuristic or swarm intelligence algorithms, such as Genetic Algorithm (GA) [8], Ant Colony Optimization (ACO) [9], Particle Swarm Optimization (PSO) [10], and neural network-based algorithms [11]; the potential field methods, such as Artificial Potential Field (APF) [12], optimal-control based method [13], and geometry-based method [14]. The listed algorithms have certain advantages and disadvantages. For instance, the A* algorithm can perform global path planning in a short time, but it cannot be used in an unknown environment. The APF method has the advantages of easy implementation in local path search, but it can easily fall into a local minimum. It should be noted that complex path planning problems can hardly be solved using a single algorithm. Therefore, hybrid algorithms that combine the advantages of different algorithms have been proposed. For instance, the combination of the A* algorithm and the APF method can achieve path planning in unknown regions [15]. The combination of PSO and ACO can avoid the premature phenomenon of the PSO algorithm, and the convergence speed is also improved [16]. The combination of fuzzy logic algorithm and the dynamic window method can achieve unknown obstacle avoidance [17]. The abovementioned methods mostly focus on single-target path planning, but multi-target path planning is more complex. Meanwhile, there was little research on the path planning in an unknown obstacle environment.
In order to solve the problem of path planning in an unknown obstacle environment, a hybrid path planning algorithm was proposed in this paper. The main contributions of this work can be summarized as follows: (i) A hybrid algorithm that combines the advantages of the GWO algorithm and D* Lite algorithm, and thus effectively solves the multi-target path planning problem in an unknown obstacle environment, was proposed.
(ii) An improved GWO algorithm, which optimizes the convergence factor of the traditional GWO algorithm by introducing the Beta function to improve the convergence speed of the algorithm, was developed.
(iii) An improved D* Lite algorithm was proposed. By improving the algorithm's heuristic function, the expanded nodes are reduced, and the search speed in an unknown obstacle environment is improved. At the same time, the planning path is smoothed.
The rest of the paper is organized as follows. Section 2 presents the related works. Section 3 provides the preliminaries of the grid method, general GWO algorithm, and D* Lite algorithm and defines the UCS path planning problem. Section 4 presents the proposed hybrid multi-target path planning algorithm. Section 5 presents and discusses the experiment results. Finally, Section 6 concludes the paper.

Multi-Target Path Planning
In recent research, the multi-target path planning problem has been mostly regarded as a Traveling Salesman Problem (TSP). Gan et al. [18] introduced the scout strategy to solve the problems of stagnation behavior and slow the convergence speed of the ACO when it was used for the TSP. By adjusting the evaluation model and population size of the algorithm, the search time of the algorithm was shortened. Guo et al. [19] proposed a hybrid algorithm that combines the immune algorithm with the GA. This algorithm introduced a dynamic mutation operator and cross-deletion strategy to solve the TSP, and it can improve the convergence speed and accuracy of the immune algorithm. Liu et al. [20] proposed a Chaos Multi-population Particle Swarm Optimization (CMPSO) algorithm for ship path planning. This algorithm adopts a multi-population strategy to obtain a more accurate global optimal value. Considering the nonlinear dynamics of a vehicle and the dynamic constraint of the TSP, Jang et al. [21] proposed a sampling-based route map algorithm embedded in the route generation process based on optimal control, which represents an easy-to-process way to obtain the closest route planning solution. In recent years, the Grey Wolf Optimization (GWO) algorithm was used to solve the TSP [22]. Compared with the above-mentioned algorithms, such as the GA or PSO, the GWO algorithm has high solution accuracy, simple algorithm operation, only a few parameters to be set, and high robustness. Sopto et al. [23] used the GWO algorithm for numerical optimization in the TSP. The exchange factor and sequence were added in the GWO algorithm. To improve GWO algorithm performance, Xu et al. [24] reconstructed the coding method and target function of the GWO algorithm, and two-opt factor and dynamic, elite strategies were added to obtain a suitable discrete TSP solution. However, the aforementioned GWO algorithms have slow convergence speed and high calculation cost. In addition, solving the TSP provides only an optimization sequence for traversing multiple target points, and for completing multi-target path planning it needs to be combined with other algorithms in the face of unknown obstacles in an actual environment.

Unknown Obstacles Environment
To solve the problem of unknown obstacles in an actual environment, Hossain et al. [25] proposed a method based on bacterial foraging optimization. This method improved the selection criteria of particles, and advanced decision-making strategies were used for particle selection, so the length of the planning path was shortened. To reduce calculation complexity, Hosseininejad et al. [26] used the cuckoo search algorithm for path planning. The dimension of the feature vector was reduced to improve the performance of finding an optimization path, security, and smoothness. Faridi et al. [27] proposed an improved Artificial Bee Colony (ABC) algorithm. The free collocation point and mutation operator were introduced into the ABC algorithm, which effectively improved the obstacle avoidance accuracy in the environment with unknown obstacles. To solve the path planning problem of an Unmanned Aerial Vehicle (UAV), Ma et al. [28] proposed a dynamic augmented multi-target PSO method. The trade-off analysis of different environment targets was implemented to improve the accuracy of the proposed method. To solve the problem of the basic GA in path planning, Liu et al. [29] proposed the concept of visual graphics and safety factors, and the hill climbing algorithm was employed to search better individuals. The dynamic search efficiency of the algorithm was effectively improved. The D* Lite algorithm is an efficient path planning method with the characteristics of flexible change in a dynamic environment. This algorithm has been widely used in an unknown obstacle environment. Huang et al. [30] proposed an improved D* Lite algorithm. This algorithm introduced the concept of a combination of lazy line-of-sight and distance transformation so that the re-planned path can avoid sudden obstacles. To improve the dynamic search efficiency of the D* Lite algorithm, Khalid et al. [31] proposed a concept of predictable obstacles and introduced priority measures to improve the actual search performance of the D* Lite algorithm. However, when an environment map is complex, the above-mentioned D* Lite algorithms have many expanded nodes, so the search efficiency is relatively low, the time cost of the algorithm is high, and the planning path has many inflection points.
Considering the above deficiencies, a hybrid path planning algorithm is proposed in this paper. The proposed algorithm can be divided into two parts. First, based on the environmental map modeling by the grid method, the multi-target planning problem is transformed into a TSP, and the improved GWO algorithm is used to plan a multi-target cruise sequence. Second, based on the obtained sequence, the improved D* Lite algorithm is used for path planning between every two target points. Thus, a circular path that starts from the start point, traverses multiple sampling points, and finally returns to the start point in an unknown obstacle environment is obtained.

Map Construction
The basic principle of the grid method is that an environment map is divided into independent grid units of the same size according to a certain resolution [32]. In an actual environment, each position is represented by a grid, and a grid has the respective status. In this study, the grid method was used to construct the public water environment map, which denotes a two-dimensional coordinate map. In the grid map, the passable area was represented by a white grid, and the obstacle area was represented by a black grid. When the environment changes, for instance, due to the appearance and disappearance of obstacles or their movement, the grid color corresponding to the changed area also changed accordingly.

GWO Algorithm
The GWO is a heuristic intelligent optimization algorithm proposed by Mirjalili in 2014. The hierarchy of the grey wolf and the search process for prey are simulated in this algorithm. In a group of grey wolves, the level, which is determined by the leadership, is divided into four levels: α, β, δ, and ω in a pyramid shape, as shown in Figure 1. In this study, the grid method was used to construct the public water environment map, which denotes a two-dimensional coordinate map. In the grid map, the passable area was represented by a white grid, and the obstacle area was represented by a black grid. When the environment changes, for instance, due to the appearance and disappearance of obstacles or their movement, the grid color corresponding to the changed area also changed accordingly.

GWO Algorithm
The GWO is a heuristic intelligent optimization algorithm proposed by Mirjalili in 2014. The hierarchy of the grey wolf and the search process for prey are simulated in this algorithm. In a group of grey wolves, the level, which is determined by the leadership, is divided into four levels: α, β, δ, and ω in a pyramid shape, as shown in Figure 1. The hierarchy of wolves decreases gradually from top to bottom. Wolves in the upper part have the leadership over wolves in the lower hierarchy, and the wolves in the lower hierarchy generate feedback based on the order of the wolves in the upper hierarchy. By calculating the fitness function of the GWO, the top-three fitness functions are wolves α, β, and δ, and the rest of the wolves are ω. According to the position information provided by wolves α, β, and δ, wolves ω update their position and move to the prey. According to the related literature [22], the mathematical expressions of the search process are as follows: where X  is the position vector of a grey wolf, t is the current iteration, P X  is the position vector of the prey, A = 2a•r1 − a and C = 2•r2 are coefficients, where r1 and r2 are the random numbers in [0, 1], respectively, and a is linearly decreased from 2 to 0 over the course of iterations: where t indicates the current iteration, and MaxIter indicates the total number of iterations.
The other wolves update their positions according to the positions of α, β, and δ as follows: ( ) The hierarchy of wolves decreases gradually from top to bottom. Wolves in the upper part have the leadership over wolves in the lower hierarchy, and the wolves in the lower hierarchy generate feedback based on the order of the wolves in the upper hierarchy. By calculating the fitness function of the GWO, the top-three fitness functions are wolves α, β, and δ, and the rest of the wolves are ω. According to the position information provided by wolves α, β, and δ, wolves ω update their position and move to the prey. According to the related literature [22], the mathematical expressions of the search process are as follows: where → X is the position vector of a grey wolf, t is the current iteration, → X P is the position vector of the prey, A = 2a·r 1 − a and C = 2·r 2 are coefficients, where r 1 and r 2 are the random numbers in [0, 1], respectively, and a is linearly decreased from 2 to 0 over the course of iterations: where t indicates the current iteration, and MaxIter indicates the total number of iterations. The other wolves update their positions according to the positions of α, β, and δ as follows: The pseudo code of GWO is detailed in literature [22].

D* Lite Algorithm
The D* Lite algorithm is a path planning algorithm proposed by Koenig and Likhachev. It is based on the LPA* algorithm [33]. The D* Lite algorithm represents a heuristic algorithm that can solve the path planning problem in an unknown obstacle environment. Unlike the forward search method of the LPA* algorithm, the D* Lite algorithm uses the reverse search method. Due to the incremental planning, the D* Lite algorithm has a short re-planning time, which makes it suitable for environment maps with unknown obstacles.
Assume that S denotes the finite set of nodes in a graph, and Succ ⊂ S denotes the set of successors of a node s, s∈S. The path cost rhs(s) from the current node s to the goal node s goal is calculated by: where c(s, s') denotes the cost of moving from node s to node s'∈Succ(s); g(s') is the actual path cost from the current extension node s' to node s goal . rhs(s) is updated earlier than g(s), and all rhs(s) of the expanded nodes are updated as obstacles appear or disappear, but not all g(s) of the nodes need to update with rhs(s). A detailed description of the above variables can be found in literature [34].

Problem Formulation
The aim of multi-target path planning is to find a path for a UCS that traverses all non-repeating target points. The UCS starts from the start point, traverses all target points, and finally returns to the start point; the path is a circular route. In actual lakes, there are many unknown obstacles, such as ships, moving creatures, floating objects, and various submerged reefs. Therefore, a UCS needs to detect these unknown obstacles and avoid them in time; meanwhile, the planning path needs to be as short as possible to reduce the probability of accidents.
The overall framework of multi-target path planning consists of two modules. The first module aims to obtain a multi-target cruise sequence, which provides a path traversing all non-repeating target points with the criterion of minimum path length. In this process, the unknown obstacles are not considered. Based on the planned target sequence, the second part re-plans the obtained path between every two target points independently using the criterion of the minimum path length and constraint of unknown obstacle avoidance, so a closed path that can guide a UCS traversing all target points safely through an unknown obstacle environment is obtained.
Assume a set of d target points, D = {1, 2, . . . , d}. The element in D represents the serial number of d target points. The number of permutations of elements in D is d!. Let I = {l 1 , l 2 , . . . , l d } be a permutation of D. A set of all permutations for d target points is denoted as V = {I 1, I 2 , . . . , I d! } and an element in V represents a cruise sequence for d target points. The object is to find a set I i (i∈{1,2, . . . , d!}) to minimize the following objective function F(I): where L(l i , l j ) is the Euclidean distance between targets l i and l j , and the obtained I i is the optimized target cruise sequence. Then, based on I i , the path between every two target points is re-planned based on the constraint of unknown obstacle avoidance. This paper considered an area of unknown obstacles, such as ships, moving creatures, floating objects, and various submerged reefs, as a forbidden area that a UCS must avoid. S f is a set of the paths through the forbidden area. The closed-path P is expressed as follows: where P(l i , l j ) is the re-planned path between targets l i and l j , and 1 ≤ i ≤ d − 1.

Improved GWO
According to the problem formulation, the considered multi-target path planning problem is similar to the TSP, so it can be transformed into the TSP. The GWO algorithm has been widely used for solving the TSP due to the advantages of few parameters and easy implementation, but it suffers from slow convergence speed. Therefore, this paper first constructed a multi-target coding method according to the optimization path planning requirements of traversing multiple target points. Then, to solve the problem of a slow convergence speed of the GWO algorithm, the convergence factor was improved to increase the convergence speed.

Multi-Target Encoding Construction
The solution range of the traditional GWO algorithm is a two-dimensional continuous space, and the optimization solution can be obtained by determining the value ranges of the target function and independent variables. To solve the TSP of a multi-target cruise sequence, it is necessary to construct a multi-target coding method suitable for the GWO algorithm.
The parameters of the GWO algorithm are as follows: n is the number of grey wolves, d is the number of target points, and X i is the position sequence of the ith grey wolf traversing d target points. The formula of X i is as follows: where x id implies the ith grey wolf located at the dth target; n grey wolves search the optimization sequence in the d-dimensional space, and a spatial domain matrix X of n × d can be obtained as follows: where x im implies the ith grey wolf located at the mth target, and X i is the ith row of matrix X and it can be regarded as a sequence to visit d target points of the ith wolf. When the number of target points is d, the distance between each target can be represented as a d-dimensional matrix P, which is expressed as: where s jm denotes the Euclidean distance between the jth and mth targets. The diagonal points of the matrix denote the distance from each target to itself, which is why the values of the diagonal points are all zero. After constructing the distance matrix P, it is necessary to construct a fitness function f, which represents the shortest sum of distances. The lower the value of the fitness function f is, the better the traversal sequence of the target points will be. The fitness function f is defined as: where d ∑ m=1 s jm is the sum of distances between every two target points on the path of the wolf in the jth row. When the GWO algorithm reaches the maximum number of iterations, the sequence X i with the lowest fitness function value is selected as an optimization sequence.

Convergence Factor Improvement
For swarm intelligence algorithms such as the GWO algorithm, it is very important to improve the convergence speed. According to Equations (1) and (2), the convergence factor a directly affects the convergence speed. a in the traditional GWO algorithm can be expressed in the form of a linear function; it decreases from 2 to 0 as the number of iterations increases. When the maximum number of iterations is reached, the value of a is 0, as shown in Figure 2. As the number of iterations increases, a is presented as a linear descending line, so the convergence speed of the GWO algorithm is slow. Therefore, this study aims to improve the convergence speed of a to increase the convergence speed of the GWO algorithm. Step 4 Construct fitness function f using Equations (12) and (13) Step 5 Calculate and update the target sequence using Equations (2)- (6) and (10)- (14), respectively Step 6 if number of iterations < n Step 7 Repeat Steps 5-7 Step 8 else Output the target sequence As shown in Figure 2, the improved expression of a is a nonlinear decreasing curve. As previously mentioned, a tended to be 0 when the maximum number of iterations was reached. The convergence speed of a was faster than that in the traditional GWO algorithm. Therefore, the convergence speed of the GWO algorithm can be increased, and the operation time of the algorithm can be reduced. According to Figure 2, the adjustment factor (λ1 = 1, λ2 = 0.1) corresponding to the curve with the best convergence is chosen.

Improved D* Lite Algorithm
The traditional D* Lite algorithm is a heuristic algorithm based on the reverse search. The advantage of this algorithm is that it can use previously searched path information to improve the efficiency of the current search, so the calculation burden is reduced. When an environment changes, only the heuristic value and the path cost from the target point to the new start point should be updated, so this algorithm can adapt well to the environment map with unknown obstacles. However, in a complex environment map, due to the rapid increase of the number of expanded nodes, the D * Lite algorithm takes a lot of time, which leads to the low efficiency of the algorithm. Meanwhile, there are many inflection points in the planning path, which is not conducive to the actual cruise. To overcome the two problems, in this study, the heuristic function in the D* Lite algorithm was improved to reduce the number of expanded nodes, so the search efficiency was improved. At the same time, the planning path was smoothed. In order to make the initial value of a tend to 0, on the basis of the beta function [35], a can be expressed as: where t is the current number of iterations, t max is the maximum number of iterations, B(*) is beta function, µ 1 and µ 2 are the position adjustment factors, and λ 1 and λ 2 are the speed adjustment factors. According to the related literature [36] the specific implementation process of the improved GWO Algorithm is as follows. Algorithm: The Improved GWO Algorithm.
Step 1 Initialize parameters Step 2 Construct matrix X using Equation (11) Step 3 Construct matrix P and calculate d ∑ m=1 s jm Step 4 Construct fitness function f using Equations (12) and (13) Step 5 Calculate and update the target sequence using Equations (2)- (6) and (10)- (14), respectively Step 6 if number of iterations < n Step 7 Repeat Steps 5-7 Step 8 else Output the target sequence As shown in Figure 2, the improved expression of a is a nonlinear decreasing curve. As previously mentioned, a tended to be 0 when the maximum number of iterations was reached. The convergence speed of a was faster than that in the traditional GWO algorithm. Therefore, the convergence speed of the GWO algorithm can be increased, and the operation time of the algorithm can be reduced. According to Figure 2, the adjustment factor (λ 1 = 1, λ 2 = 0.1) corresponding to the curve with the best convergence is chosen.

Improved D* Lite Algorithm
The traditional D* Lite algorithm is a heuristic algorithm based on the reverse search. The advantage of this algorithm is that it can use previously searched path information to improve the efficiency of the current search, so the calculation burden is reduced. When an environment changes, only the heuristic value and the path cost from the target point to the new start point should be updated, so this algorithm can adapt well to the environment map with unknown obstacles. However, in a complex environment map, due to the rapid increase of the number of expanded nodes, the D* Lite algorithm takes a lot of time, which leads to the low efficiency of the algorithm. Meanwhile, there are many inflection points in the planning path, which is not conducive to the actual cruise. To overcome the two problems, in this study, the heuristic function in the D* Lite algorithm was improved to reduce the number of expanded nodes, so the search efficiency was improved. At the same time, the planning path was smoothed.

Heuristic Function Improvement
The D* Lite algorithm introduces an evaluation function k(s). The node expands in the priority queue with the smallest k(s). The k(s) contains two components [k 1 (s); k 2 (s)] as follows: where h(s start, s) is the heuristic function that represents the path cost from the start node s start to the node s. k(s) is compared according to a lexicographic ordering. For example, k(s) is less than or equal to k'(s), denoted by k(s) ≤ k'(s), if either k 1 (s) < k 1 (s) or (k 1 (s) = k 1 (s) and k 2 (s) ≤ k 2 (s)). k m is the superposition of node moving distance and k m := k m + h(s last , s start ). It is a variable that updates with a change in the environment. It has been shown that the heuristic function h(s) directly affects the evaluation function k(s); the mathematical expressions of h(s start, s) are as follows: where h(s', s goal ) is the cost function from a node s' to the node s goal .
In the D* Lite algorithm, the number of expanded nodes directly affects its search efficiency. The evaluation function k(s) determines whether a node is expanded. When expanding a node, according to Equation (15), k 1 (s) of adjacent nodes will be compared preferentially, and k 1 (s) contains the heuristic function. Therefore, the search efficiency of the algorithm is directly affected by the heuristic function.
Because the heuristic function h(s) in the traditional D* Lite algorithm uses the chessboard distance, when expanding nodes near the goal node, it is easy to have multiple nodes with the same value of k 1 (s). In Figure 3, the black grid represents an obstacle node, and the light grey grid represents the current expanded node. When expanding a node from the goal node, the first step is to initialize the node information. In Step 1, E3 is selected as a goal node to expand its adjacent nodes, and there are three nodes with the same value of k 1 (s), which are denoted as D2, D3, and D4, and k 1 (s) values of D2, D3, and D4 are all the smallest value. Thus, the three nodes all need to be expanded gradually. After Step 2, four nodes need to expand, which is represented by the grey grid. It can be observed that there are at least four steps needed to expand the node at layer D. After expanding the three nodes, there are three nodes with the same value of k 1 (s), which are nodes C1, C2, and C3, and they are all the smallest among all surrounding expanded nodes. Therefore, the same steps are repeated. If many multiple nodes are the same, more nodes need to be expanded, which will increase the calculation time and reduce the search efficiency of the algorithm. where x and y are the horizontal and vertical coordinates of the current node, respectively; xstart and ystart are the horizontal and vertical coordinates of the start node, respectively; xgoal and ygoal are the horizontal and vertical coordinates of the goal node, respectively; w is the weight factor, and its range is [1,2]. Step1  The specific process of the improved D* Lite algorithm is presented in Figure 4. First, the node information is initialized, then the adjacent nodes of node E3 are expanded. In Step 1, the node information is obtained by Equation (17). The node with the smallest value of k1(s) is node D2, so the other two nodes, nodes D3 and D4, are no longer needed to be expanded, and only one step is needed to expand the node at layer D. In Step 2, node D2 is selected for expansion. Based on the expansion result, a node with the smallest value of k1(s) is node C1, and there are no other nodes with the same values. Therefore, after Step 2, the node expansion at layer D is completed and there are only two expansion nodes (grey grid). However, by the traditional D* Lite algorithm in Figure 3, the number of expansion nodes is four, so the improved D* Lite algorithm of the heuristic function reduces two expanded nodes, and therefore has a higher search efficiency.
The comparison results of the traditional D* Lite algorithm and the proposed D* Lite algorithm were shown in Figure 5, where the black grid represented an obstacle, the grey grid represented the expanded node, and the red grid represented the final path. It was shown that, in the case of the same obstacles, although the planning paths of the two To reduce the number of expanded nodes and increase the search efficiency of the algorithm, according to the related literature [37], an improved heuristic function is proposed. The specific improvements are as follows: horizontal or vertical unit movement cost is defined as one, and the diagonal unit movement cost is defined as √ 2. On this basis, a weight function is added, wherein the greater the value of the weight function is, the farther a node will be from the start node. Therefore, values of k 1 (s) and k 2 (s) of multiple nodes are less likely to be equal, the number of expanded nodes is reduced, and the search efficiency is improved. The improved heuristic function h' is defined as follows: where x and y are the horizontal and vertical coordinates of the current node, respectively; x start and y start are the horizontal and vertical coordinates of the start node, respectively; x goal and y goal are the horizontal and vertical coordinates of the goal node, respectively; w is the weight factor, and its range is [1,2]. The specific process of the improved D* Lite algorithm is presented in Figure 4. First, the node information is initialized, then the adjacent nodes of node E3 are expanded. In Step 1, the node information is obtained by Equation (17). The node with the smallest value of k 1 (s) is node D2, so the other two nodes, nodes D3 and D4, are no longer needed to be expanded, and only one step is needed to expand the node at layer D. In Step 2, node D2 is selected for expansion. Based on the expansion result, a node with the smallest value of k 1 (s) is node C1, and there are no other nodes with the same values. Therefore, after Step 2, the node expansion at layer D is completed and there are only two expansion nodes (grey grid). However, by the traditional D* Lite algorithm in Figure 3, the number of expansion nodes is four, so the improved D* Lite algorithm of the heuristic function reduces two expanded nodes, and therefore has a higher search efficiency.    The comparison results of the traditional D* Lite algorithm and the proposed D* Lite algorithm were shown in Figure 5, where the black grid represented an obstacle, the grey grid represented the expanded node, and the red grid represented the final path. It was shown that, in the case of the same obstacles, although the planning paths of the two methods were the same, the number of expanded nodes in the improved D* Lite algorithm were obviously less than in the traditional D* Lite algorithm. Thus, the search efficiency was improved, and the calculation time of the algorithm was effectively reduced.

Path Smoothing
When the traditional D* Lite algorithm is used for path planning, there are many inflection points in the planned path. The inflection points will increase the path length, as well as the control difficulty and power consumption of the UCS. Therefore, it is necessary to smooth the planning path by eliminating inflection points. According to the related literature [38], the specific implementation process of path smoothing is as follows.

Path Smoothing
When the traditional D* Lite algorithm is used for path planning, there are many inflection points in the planned path. The inflection points will increase the path length, as well as the control difficulty and power consumption of the UCS. Therefore, it is necessary to smooth the planning path by eliminating inflection points. According to the related literature [38], the specific implementation process of path smoothing is as follows.

Algorithm : Path smoothing
Step 1 Label each point on the planning path from one to n Step 2 Connect points 1 and 2 and check whether the connection passes through the obstacles Step 3 Check until the connection between points 1 and k (k < n) passes through the obstacle Step 4 Connect point 1 and (k − 1) and replace the previous path from point 1 to (k − 1) Step 5 Use point (k − 1) as a new start point and repeat the above steps until the target is reached Based on Sections 4.2.1 and 4.2.2., according to the related literature [33], the specific implementation process of the improved D* Lite Algorithm is as follows. Step 1 Label each point on the planning path from one to n Step 2 Connect points 1 and 2 and check whether the connection passes through the obstacles Step 3 Check until the connection between points 1 and k (k < n) passes through the obstacle Step 4 Connect point 1 and (k − 1) and replace the previous path from point 1 to (k − 1) Step 5 Use point (k − 1) as a new start point and repeat the above steps until the target is reached Based on Sections 4.2.1 and 4.2.2, according to the related literature [33], the specific implementation process of the improved D* Lite Algorithm is as follows.
Step 1 Parameter initialization Step 2 Expand adjacent nodes from s goal Step 3 Compare current k(s) values and select the node with the smallest k(s) as the next expanded node Step 4 Expand the nodes constantly until reach s start Step 5 Calculate the values of rhs(s) and move to the node with the smallest rhs(s)

Step 6
If the surrounding environment has changed Step 7 update adjacent nodes and return to Step 2 Step 8 else the current node is the new start node s' start Step 9 If node s' start is node s goal Step 10 Perform path smoothing Step 11 else return to Step 2 Step 12 Complete path planning between every two target points

Algorithm Overview
The specific process of the proposed hybrid multi-target path planning algorithm is as follows. First, a UAV is used to obtain an image of the water environment, which is then transformed into a two-dimensional coordinate map by the grid method, and the coordinates of multiple target points are set on the map. Second, the proposed improved GWO algorithm is used to obtain the sequence of multiple target points, and the serial number of target points is marked on the grid map according to the planned sequence. Third, the improved D* Lite algorithm is used to calculate a path between every two target points in the grid map, and the planning path is smoothed. Finally, a closed path, which starts from the start point, traverses multiple target points, and returns to the start point, is obtained.
Algorithm: The proposed multi-target hybrid path planning algorithm.
Step 1 Parameter initialization Step 2 Introduce improved convergence factor Step 3 Calculate fitness function Step 4 Determine target sequence of α, β, δ, and ω Step 5 If the maximum number of iterations is reached Step 6 Output the target sequence Step 7 else Update adjacent nodes and return to Step 2 Step 8 Perform path planning between two target points by the improved D* Lite algorithm Step 9 If the surrounding environment has not changed Step 10 If the new start point s' start is the target point Step 11 If it is the final target point

Simulation Experiments
To verify the advantages of the proposed hybrid algorithm, several simulation experiments were performed. Windows 10 was used as an operating system and MATLAB R2017b as a simulation tool. The hardware platform was an Intel Core E5-2620 V3 processor with a frequency of 2.4 GHz and a memory of 32 GB. The simulation environment map was created based on the public water area of Douhu, southwest of Hongze Lake, located in Jiangsu Province. The simulation environment map was created by grid method [32]. The simulation experiments were divided into two groups, experiments in ordinary and complex environments, mainly based on the grid map size and the number of target points.

Simulations in Ordinary Environments
A 50 × 50 grid map model was created in an ordinary environment. The scale of the map was one, and a grid length corresponded to 10 m in the actual environment. The UCS used a laser detector that could detect the surrounding obstacles in time, and the detection distance was 5 × 5 grid. The speed of the UCS was 10 m/s. The target coordinates were set randomly; four sets of the target coordinates were randomly chosen, and they were denoted as Cases 1-4 in Table 1. The number of target points was 10, and the serial number of the start point was marked as one. In the map, the randomly-distributed obstacle density was 10%, and the settings were as follows: each grid had two states, white or black, where the white grid represented the passable area, and the black grid represented the obstacle area. The state of each grid had a certain probability of changing. In this experiment, the change probability of the grid state was set to 3%, so when the UCS moved to a grid, the probability of a white grid changing to a black grid was 3%, and the probability of a black grid changing to a white grid was also 3%.

Case
Target Coordinate 1 (4, 46), (8,28), (6,19), (14,10), (17,22), (26,29), (29,11), (41,4), (39,26), (37,41) 2 (37,26), (38,3), (26,9), (11,8), (2,18), (3,24), (6,27), (16,28), (25,32), (36,30) 3 (17,23), (31,22), (8,9), (11,11), (19,18), (5,14), (38,17), (26,18), (14,33), (35,37) 4 (26,24), (13,31), (6,9), (15,7), (12,28), (33,24), (6,4), (36,33), (16,37), (6,13)  To verify the performance of the proposed hybrid algorithm, comparative experiments with the other four algorithms were performed. The comparison algorithms were denoted as Algorithms 1-4. Algorithm 1 used the ACO to determine the cruise sequence of target points. According to the planned sequence, the ACO was used again to achieve path planning between two target points. Similarly, Algorithm 2 used the GA to complete the multi-target path planning. Algorithm 3 used the CMPSO to complete the multi-target path planning. Algorithm 4 used the traditional GWO algorithm-traditional D* Lite algorithm to complete the multi-target path planning. The proposed hybrid algorithm used the improved GWO algorithm to determine the cruise sequence of target points, and the improved D* Lite algorithm was used for the single-target path planning. According to [39], the parameters of the ACO in Algorithm 1 were set as follows: the information priming factor was set to α = 1, the expected heuristic factor was set to β = 5, the volatilization factor was set to ρ = 0.1, the pheromone intensity was set to Q = 1, the number of ants was set to m = 200, and the maximum number of iterations was set to k = 100. According to [40], the parameters of the GA in Algorithm 2 were set as follows: the maximum number of evolution times was set to max = 50, the crossing probability was set to pc = 0.8, the mutation probability was set to pm = 0.2, the path length proportion was set to a = 1, and the path smoothness proportion was set to b = 7. According to [20], the parameters of the CMPSO in Algorithm 3 were set as follows: the inertia weight was set to ω = 0.4, and the learning factor was set to c 1 = c 2 = 2. The simulation parameters of Algorithm 4 and the proposed hybrid algorithm were presented in Table 2. The adjustment factors λ 1 and λ 2 were chosen according to the better convergence curve through a series of experiments, as shown in Figure 3. The five algorithms were used for comparison experiments in Case 1, and when the UCS returned to the start point, the obtained simulation results were shown in Figures 6-10. All five algorithms could complete the multi-target path planning; the planned target sequence was the same, but the planning path was different. The red numbers in Figures 6-15 represent the traversal sequence between every two target points. The detection area in path planning in Figure 10 was obviously smaller than that in Figure 9, which indicated the effect of the proposed algorithm.
10. All five algorithms could complete the multi-target path planning; the planned target sequence was the same, but the planning path was different. The red numbers in Figures  6-15 represent the traversal sequence between every two target points. The detection area in path planning in Figure 10 was obviously smaller than that in Figure 9, which indicated the effect of the proposed algorithm.          To avoid contingency and verify the robustness of the proposed algorithm, 50 simulation experiments were performed for Case 1 using the five algorithms in Table 3. According to the related literature [41], the t-test was used to verify whether the proposed hybrid algorithm significantly improved the path planning performance. The data of the          To avoid contingency and verify the robustness of the proposed algorithm, 50 simulation experiments were performed for Case 1 using the five algorithms in Table 3. According to the related literature [41], the t-test was used to verify whether the proposed hybrid algorithm significantly improved the path planning performance. The data of the proposed hybrid algorithm were taken as the total samples, the other four algorithms were used as test samples, and the significance difference was 0.05. When the t-test value was less than 0.05, it meant that the test performance of the algorithm was significantly different from that of the proposed algorithm, and the significant improvement was indicated by '+'. In addition to the planning time and length, the number of inflection points was also used as an evaluation metric of algorithm performance since it can reflect the smoothness of the path. Too many inflection points would increase the UCS's energy consumption and reduce its safety. The statistical analysis of the results are shown in Table 3, where it can be seen that the proposed hybrid algorithm always found an optimal solution and converged to the stable state, and its performance was better than those of the other four comparative algorithms. In addition, to verify the efficiency and generalizability of the proposed algorithm, the other three cases were simulated using different target coordinates. The algorithm performance comparison was given in Table 4, where it can be seen that in terms of planning time, planning length, and the number of inflection points, the proposed hybrid algorithm performed better than the other four comparative algorithms. Consequently, the proposed hybrid algorithm had stronger applicability and higher performance than the other four algorithms.
Fifty simulation experiments of the five algorithms in Case 1 were performed in complex environments. The statistical analysis of the experimental results was shown in Table 6. The results in Table 6 further validated the robustness of the proposed algorithm in complex environments. In the t-test, the proposed algorithm performed significantly better than the other four hybrid algorithms. Compared with the ordinary environments, the proposed hybrid algorithm showed more obvious advantages in complex environments. The other three cases were also simulated using different target coordinates, and the results were presented in Table 7. Similarly, the performance of the proposed hybrid algorithm was better than of the other four comparison algorithms.

Performance Testing of the Proposed Algorithm
To investigate the performance of the proposed algorithm further, according to [42], four benchmark test functions were chosen for comparison experiments, and they have been given in Table 8.

Unimodal function
Sphere Multimodal function The four benchmark test functions were solved by the Improved Grey wolf optimization algorithm (IGWO) and compared with the numerical calculation results of the Ant colony optimization, Genetic algorithm, Chaos multi-population particle swarm optimization, and Grey wolf optimization algorithms. The parameters of these algorithms were the same as in the above-mentioned experiments. The experiments with each of the functions were run independently 30 times, and the mean and standard deviation of the algorithms were recorded. The comparison results of five algorithms for the optimization of f 1 -f 4 were shown in Table 9. For function f 3 , the results of the proposed Improved Grey wolf optimization algorithm tended to converge to the theoretical optimal value of zero. For functions f 1 , f 2 , and f 4 , the experimental results of the Improved Grey wolf optimization algorithm were also close to the optimal value. In terms of solution accuracy (mean value) and robustness (standard deviation), the performance of the proposed Improved Grey wolf optimization algorithm was significantly superior to those of the other four algorithms.
The iterative average convergence curves of the five algorithms for the four functions are shown in Figure 16, where it can be seen that the IGWO had a faster convergence rate than the other algorithms.  The iterative average convergence curves of the five algorithms for the four functions are shown in Figure 16, where it can be seen that the IGWO had a faster convergence rate than the other algorithms.

Conclusions
In this paper, a hybrid algorithm based on the improved GWO-D* Lite algorithm was proposed. First, the multi-target planning problem was transformed into a TSP, and the improved GWO algorithm was used to plan a multi-target cruise sequence. Second, based on the obtained sequence, the improved D* Lite algorithm was used for path planning between every two target points. The simulation verification of the proposed algorithm was conducted in both ordinary and complex environments. The comparative simulation experiments with the other four algorithms were also implemented. To avoid contingency and verify the efficiency and generalizability of the proposed algorithm, a detailed statistical analysis based on multiple experiments and a performance comparison with different target coordinates were performed. The simulation results show that, in terms of planning time, planning distance and number of inflection points, the proposed algorithm has obvious advantages over the other four algorithms. Meanwhile, the results of four standard test functions show that the proposed algorithm has strong optimization ability. The proposed algorithm can provide important guidance for multi-target path planning of UCS in an unknown obstacle environment and promote the development of intelligent technology of UCS.
In future research, the proposed algorithm could be improved in two aspects. First, a number of the proposed algorithm's parameters were obtained by experience, so they were influenced by subjective factors. Therefore, these parameters could be optimized by certain methods. Second, the grid map was a 2D model, so it had limitations in reflecting an actual environment. In future research, a 3D map model could be built for an actual environment, which can more accurately reflect the actual shape of a UCS and obstacles. In addition, the currents in the lake, the health status, dynamic behavior, and motion limitation of a UCS could also be constraints of the path optimization problem.